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Abstract 

An  accurate  and  efficient  algorithm  for  solving  the  constrained  U-norm  minimization  prob¬ 
lem  is  highly  needed  and  is  crucial  for  the  success  of  sparse  signal  recovery  in  compressive 
sampling.  Most  of  existing  algorithms  in  the  literature  give  an  approximate  solution  to  the 
problem.  We  tackle  the  constrained  U-norm  minimization  problem  by  reformulating  it  via  an 
indicator  function  which  describes  the  constraints.  The  resulting  model  is  solved  efficiently  and 
accurately  by  using  an  elegant  proximity  operator  based  algorithm.  We  establish  convergence 
analysis  of  the  resulting  algorithm.  Numerical  experiments  show  that  the  proposed  algorithm 
performs  well  for  sparse  signals  with  magnitudes  over  a  high  dynamic  range.  Furthermore,  it 
performs  significantly  better  than  the  well-known  algorithm  NESTA  in  terms  of  the  quality  of 
restored  signals  and  the  computational  complexity  measured  in  the  CPU-time  consumed. 


1  Introduction 

In  this  paper,  we  study  recovery  of  an  unknown  vector  uq  G  Mn  from  the  observed  data  b  G  Mm 
and  the  model 

b  =  Auq  +  z,  (1) 

where  A  is  a  known  m  x  n  measurement  matrix  and  z  G  Mm  is  a  noise  term.  Under  an  assumption 
that  the  vector  uq  of  interest  is  sparse,  the  work  in  [4,  5]  shows  that  an  accurate  estimation  of  uq 
is  possible  even  when  m  <  n,  that  is,  the  observations  are  less  than  unknowns.  Recently,  there 
is  a  significant  body  of  work  that  focuses  on  finding  an  approximation  to  uq  by  solving  a  convex 
optimization  problem.  In  the  presence  of  noise-free  data,  i.e.,  z  =  0,  the  optimization  problem  is 

(BP)  min{||it||i  :  u  G  Mn}  subject  to  b  =  Au, 

which  essentially  is  the  basis  pursuit  problem  proposed  early  in  the  context  of  time-frequency  rep¬ 
resentation  [8].  Here,  ||  •  ||i  denotes  the  IJ-norm  of  a  vector  in  an  Euclidean  space.  The  optimization 
model  (BP)  can  be  solved  by  linear  programming. 

In  the  presence  of  noisy  data,  the  linear  constraint  b  =  Au  in  (BP)  is  relaxed  to  an  inequality 
constraint  ||  Au  —  b H2  <  e,  where  ||  •  H2  denotes  the  ^2-norm  of  a  vector  in  an  Euclidean  space.  As  a 
result,  the  optimization  model  (BP)  becomes  the  basis  pursuit  denoising  problem 

(BPe)  min{||u||i  :  u  G  Mn}  subject  to  ||Au  —  6|| 2  <  e, 

‘This  research  is  supported  in  part  by  2012  Air  Force  Visiting  Faculty  Research  Program,  by  an  award  from  the 
NRC  (National  Research  Council)  via  AFSOR,  by  the  US  National  Science  Foundation  under  grant  DMS-1115523. 
^Department  of  Mathematics,  Syracuse  University,  Syracuse,  NY  13244,  USA. 

^Air  Force  Research  Laboratory,  AFRL/RITB,  Rome,  NY  13441-4505.  Email:  Bruce.Suter@rl.af.mil 
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where  e2  is  an  estimated  upper  bound  of  the  noise  power. 

Both  problems  (BP)  and  (BPe)  are  closely  related  to  the  penalized  least-squares  problem 

(QPa)  min  { ~ 1 1 Au  6 1 1 2  +  A||u||i  :  u  <E  Mn|  . 

A  large  amount  of  research  has  been  done  on  solving  problems  (BP),  (BPe),  and  (QPA).  Here,  we 
only  give  a  brief  and  non-exhaustive  review  of  results  for  these  problems.  In  [8],  problems  (BP)  and 
(Q?a)  are  solved  by  first  reformulating  them  as  perturbed  linear  programming  and  then  applying 
a  primal-dual  interior-point  approach  [24],  Recently,  many  iterative  shrinkage/thresholding  algo¬ 
rithms  are  proposed  to  handle  problem  (QPA).  These  include  the  proximal  forward-backward  split¬ 
ting  [9],  the  gradient  projection  for  sparse  reconstruction  [12],  the  FISTA  (fast  iterative  shrinkage¬ 
thresholding  algorithm)  [1],  the  fixed-point  continuation  algorithm  [13],  the  Bregman  iterative  regu¬ 
larization  [3,  25],  and  the  reference  therein.  Problem  (BPe)  also  frequently  appears  in  wavelet-based 
signal/image  restoration  [6,  7]  with  the  matrix  A  associated  with  some  inverse  transforms. 

Problem  (BPe)  can  be  formulated  as  a  second-order  cone  program  and  solved  by  interior-point 
algorithms.  Many  suggested  algorithms  for  (BPe)  are  based  on  repeatedly  solving  (QP^)  for  various 
values  of  A.  Such  algorithms  are  referred  to  as  the  homotopy  method  originally  proposed  in  [11,  20]. 
The  homotopy  method  is  also  successfully  applied  to  (BP)  in  [10].  A  common  approach  for  obtaining 
approximate  solutions  to  (BPe)  is  often  accompanied  by  solving  (QPA)  for  a  decreasing  sequence 
of  values  of  A  [22],  The  optimization  theory  asserts  that  problems  (BPe)  and  (QP^)  are  equivalent 
provided  that  the  parameters  e  and  A  satisfy  certain  relationship  [21].  Since  this  relationship  is 
hard  to  compute  in  general,  solving  problem  (BPe)  via  repeatedly  solving  (QPA)  for  various  values 
of  A  is  problematic.  Recently,  the  NESTA  [2]  which  employs  Nesterov’s  optimal  gradient  method 
was  proposed  for  solving  relaxed  versions  of  (BP)  and  (BPe)  via  Nesterov’s  smoothing  technique 
[18].  Clearly,  the  closeness  of  the  solution  to  the  relaxed  version  of  (BP)  (or  the  relaxed  version  of 
(BPe))  to  the  solution  to  (BP)  (or  (BPe))  is  determined  by  the  level  of  the  closeness  of  the  smoothed 
£i-norm  to  the  £ i -norm  itself.  Certainly,  the  performance  of  these  approaches  depends  on  the  fine 
tuning  of  the  parameter  A  in  (QPA)  or  a  parameter  that  controls  the  degree  of  the  closeness  of  the 
^1-norm  and  its  smoothed  version. 

In  this  paper,  we  consider  solving  problems  (BP)  and  (BPe)  by  a  different  approach.  We  convert 
the  constrained  optimization  problems  to  unified  unconstrained  one  via  an  indicator  function.  The 
corresponding  objective  function  for  the  unconstrained  optimization  problem  is  the  sum  of  the  t\- 
norm  of  the  underlying  signal  u  and  the  indicator  function  of  a  set  in  Rm,  which  is  {0}  for  (BP) 
or  the  e-ball  for  (BPe),  composing  with  the  affine  transformation  Au  —  b.  Non-differentiability 
of  both  the  ^i-norrn  and  the  indicator  of  the  set  imposes  challenges  for  solving  the  associated 
optimization  problem.  Fortunately  enough,  their  proximity  operators  have  explicit  expressions. 
The  solutions  for  the  problem  can  be  viewed  as  fixed-points  of  a  coupled  equation  formed  in  terms 
of  these  proximity  operators.  An  iterative  algorithm  for  finding  the  fixed-points  is  then  developed. 
The  main  advantage  of  this  approach  is  that  solving  (QPA)  or  smoothing  the  l\ -norm  are  no  longer 
necessary.  This  makes  the  proposed  algorithm  attractive  for  solving  (BP)  and  (BPe).  The  efficiency 
of  fixed-point  based  proximity  algorithms  has  been  demonstrated  in  [9,  14,  15,  16]  for  various  image 
processing  models. 

The  rest  of  the  paper  is  organized  as  follows:  In  section  2  we  reformulate  the  A] -norm  minimiza¬ 
tion  problems  (BP)  and  (BPe)  via  an  indicator  function  and  characterize  solutions  of  the  proposed 
model  in  terms  of  fixed-point  equations.  We  also  point  out  the  connection  between  the  proposed 
model  and  (QPa)  through  the  Moreau  envelope.  In  section  3  we  develop  an  algorithm  for  the  re¬ 
sulting  minimization  problem  based  on  the  fixed-point  equations  arising  from  the  characterization 
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of  the  proposed  model.  Numerical  experiments  are  presented  in  section  4.  We  draw  our  conclusions 
in  section  5. 


2  An  ^i-Norm  Optimization  Model  via  an  Indicator  Function 


In  this  section,  we  consider  a  general  optimization  model  that  includes  models  (BP)  and  (BPe)  as 
its  special  cases  and  characterize  solutions  to  the  proposed  model. 

We  begin  with  introducing  our  notation  and  recalling  necessary  background  from  convex  anal¬ 
ysis.  For  the  usual  d-dimensional  Euclidean  space  denoted  by  Rrf  we  define  (x,y)  :=  Yli=i  xiVii  f°r 
x,y  £  Rd,  the  standard  inner  product  in  Rd.  The  class  of  all  lower  semicontinuous  convex  functions 
/  :  Rd  — >  (— oo,+oo]  such  that  dom /  :=  {x  £  Rd  :  f(x)  <  +oo}  /  0  is  denoted  by  Fo(Rd).  The 
indicator  function  of  a  closed  convex  set  C  in  Rrf  is  defined,  at  u  £  Rrf,  as 

,  ,  f  0,  if  u  £  C, 
trW  '■=  \  , 

(  Too,  otherwise. 

Clearly,  the  indicator  function  lq  is  in  To(Md)  for  any  closed  nonempty  convex  set  C.  In  particular, 
we  define  a  ball  in  Rm  centered  at  the  origin  with  radius  e  as  Be  :=  {u  :  v  £  Rm  and  ||u||2  <  e}. 
Given  a  matrix  A  £  Rmxn  and  a  vector  b  £  Rm,  we  consider  the  following  optimization  problem 

min{||u||i  T  LBe{Au  —  b)  :  u  £  Rn}.  (2) 

We  can  easily  see  that  if  e  =  0  then  model  (2)  reduces  to  (BP)  and  if  e  >  0  then  model  (2)  reduces 
to  (BPe).  In  other  words,  the  both  constrained  optimization  problems  (BP)  and  (BPe)  can  be 
unified  as  the  unconstrained  optimization  problem  (2)  via  the  indicator  function  lbc- 

In  the  following,  we  shall  focus  on  characterizing  solutions  of  model  (2)  using  fixed-point  equa¬ 
tions.  To  characterize  solutions  of  model  (2),  we  first  need  two  concepts,  namely,  the  proximity 
operator  and  subdifferential  of  functions  in  Fo(Mci).  For  a  function  /  £  To(Mrf),  the  proximity 
operator  of  /  with  parameter  A,  denoted  by  pro xAy,  is  a  mapping  from  Rd  to  itself,  defined  for  a 
given  point  x  £  M.d  by  proxAy(x)  :=  argmin  { —  x\\%  T  f(u)  :  u  £  M6*}  .  The  subdifferential  of  a 
proper  convex  function  i/j  £  Fo(Mrf)  at  a  given  vector  u  £  is  the  set  defined  by 

di/j(u )  :=  {v  :  v  £  Md  and  ip(w)  >  ip(u)  +  (v,  w  —  u)  for  all  w  £  Rd}. 

The  subdifferential  and  the  proximity  operator  of  the  function  ^  are  related  in  the  following  way 
(see,  e.g.  [15]):  For  u  in  the  domain  of  ijj  and  v  £  Rd 

v  £  d^(u)  if  and  only  if  u  =  pro xAtt  T  v).  (3) 

Now,  with  the  help  of  the  subdifferential  and  the  proximity  operator,  we  can  characterize  a 
solution  of  the  indicator  function  based  model  (2)  via  fixed-point  equations. 


Proposition  2.1  Let  e  be  a  nonnegative  number,  let  Be  be  the  ball  in  Rm  centered  at  the  origin 
with  radius  e,  let  b  be  a  point  in  Rm,  and  let  A  be  an  m  x  n  matrix.  If  u  £  Rn  is  a  solution  to 
model  (2),  then  for  any  a  >  0  and  f3  >  0  there  exists  a  vector  v  £  Rm  such  that 


u  = 

v  = 

Conversely,  if  there  exist  a  >  0,  u  £  R 
solution  of  model  (2). 


P  aT„ 


Proxii||jl||1  |  u  —  —A  '  v  I  ,  (4) 

(/-proxtfle(._6))(Au  +  i;).  (5) 

n,  and  v  £  Rm  satisfying  equations  (4)  and  (5),  then  u  is  a 
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Proof:  We  first  assume  that  u  £  in  is  a  solution  of  model  (2).  Set  p  :=  lbc( •  —  b).  Hence  Au  —  b 
must  be  in  the  ball  Be.  Therefore,  both  sets  c?||  •  ||i(u)  and  dp(Au)  are  nonempty.  By  Fermat’s 
rule  we  have  that  0  £  9||  •  ||i(u)  +  AT  dp(Au).  Therefore,  for  any  a  >  0  and  /3  >  0  there  exist 
w  £  P<9||  •  ||i (it)  and  v  £  ^dp(Au)  such  that  0  =  aw  +  /3ATv ,  i.e. ,  w  =  —^AJv.  By  using  (3), 
inclusion  w  £  P<9||  •  ||i(u)  implies  u  =  proxxn.^  (u  +  w),  which  is  equation  (4).  Since  =  p  for 

any  f3  >  0,  inclusion  v  £  jjdp(Au)  leads  to  Au  =  prox(/3(i;  +  Au),  which  is  equivalent  to  (5). 

Conversely,  if  equations  (4)  and  (5)  are  satisfied  for  some  a  >  0,  /3  >  0,  u  £  Rn,  and  v  £  Rm, 
using  (3)  again,  we  have  that  —  ^ATv  £  <9(^||  •  ||i)  (it)  and  v  £  dp(Au).  Since  <9(P||  •  ||i)  (it)  = 
P<9||  •  ||i (it)  and  dp(Au)  =  ftdp(Au),  we  know  from  the  above  that  0  £  <9||  •  ||i(u)  +  AT dp(Au). 
This  indicates  that  it  is  a  solution  of  model  (2).  The  proof  is  complete.  □ 

We  remark  that  the  above  fixed-point  characterization  can  be  identified  as  a  special  case  of 
Proposition  1  in  [16].  We  include  the  proof  of  Proposition  2.1  here  for  making  the  paper  self- 
contained. 

The  proximity  operators  of  the  functions  ||  •  ||i  and  i#£(-  —  b)  involved  in  the  characterization  can 
be  computed  efficiently.  Indeed,  the  proximity  operator  proxii.  ,,  is  the  soft-thresholding  operator 

C*  II  111 

defined  for  u  £  M”  by: 

^prox^n  i^  (u)^  [i]  =  max  |  |u[i]|  — — ,  0 1  sign(u[i]),  for  i  =  1,  2, . . . ,  n.  (6) 

The  proximity  operator  proxtB  is  given  by  the  following  lemma. 

Lemma  2.2  Let  e  be  a  nonnegative  number,  let  Be  be  the  ball  in  Rm  centered  at  the  origin  with 
radius  e,  let  b  be  a  point  in  Mm.  Then  for  a  given  v  £  Rm 

=  b  +  min  jl,  -(y-b). 

Proof:  By  the  definition  of  the  proximity  operator,  we  can  verify  directly  that  proxts  = 
b  +  pro xt  (•  —  b)  and  proxts  is  the  projection  operator  onto  the  ball  Be.  The  result  of  this  lemma 
follows  immediately.  □ 

We  close  this  section  by  making  a  connection  between  the  proposed  model  (2)  and  model 
(QPa)  via  the  Moreau  envelope  introduced  in  [17].  Recall  that  the  Moreau  envelope  of  a  function 
/  £  ro(Rd)  with  parameter  p  at  x  £  Rd  is  env^/(x)  :=  min{^||i/  —  rc|||  +  f(y)  :  y  £  Rd},  which  is 
also  in  To(Rd).  In  particular,  the  Moreau  envelope  of  the  indicator  function  lb€  with  parameter  p 
is  given  in  the  following  result. 

Lemma  2.3  Let  e  be  a  nonnegative  number,  let  p  be  a  positive  number,  let  Be  be  the  ball  in  Rm 
centered  at  the  origin  with  radius  e.  Then  env/JitBf  (x)  =  j-  (max{||x||2  —  e,  0})2,  for  any  x  £  Rm. 

Proof:  By  the  definition  of  the  Moreau  envelope,  we  have  that  env/i(,B  (x)  is  the  square  of  the 
distance  from  the  point  x  to  the  ball  Be  then  scaled  by  the  factor  Since  this  distance  is  either 
0  if  x  is  inside  the  ball  or  ||x||2  —  e  if  x  is  outside  the  ball,  our  result  follows  immediately.  □ 

As  an  immediate  consequence  of  Lemma  2.3  we  have  that  envAt{0}  =  ^||  ■  |||.  Hence,  problem 
(Q?a)  can  be  rewritten  in  terms  of  the  Moreau  envelope  as 

min  j||u||i  +  envAt{0}  (Au  —  b)  :  u  £  Rn|  .  (7) 
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In  recent  literature,  model  (7)  (i.e.,  (QPA))  is  commonly  adopted  for  finding  an  approximate  solution 
to  both  problems  (BP)  and  (BPe).  Note  that 


iBe  (x)  -  en vAtSe  (a;)  =  tBe  {x) 


and 


iBe  (x)  -  envAt{0}  (x) 


~^x\\x\\l,  if  x  £  Be] 

+oo,  otherwise. 


These  identities  clearly  indicate  that  envAtJ3f  approximates  tBe  better  than  envAt{0, .  Therefore,  we 
comment  that  instead  of  model  (7)  the  following  model 


min  { 1 1  rz.  1 1 1  +  envAtSe  {Au  —  b)  :  u  G  Mn}  (8) 

would  be  more  suitable  for  finding  an  approximate  solution  to  problem  (BPe).  In  other  words, 
(QPa)  used  in  many  algorithms  (e.g.,  [11,  20,  22,  23])  for  (BPe)  should  be  replaced  by  model  (8). 
By  doing  so,  an  improvement  for  solving  (BPe)  should  be  expected  if  the  noise  power  e2  can  be 
estimated  in  advance.  This  will  be  studied  in  our  future  project.  Unlike  models  (7)  and  (8)  we 
solve  model  (2)  directly  without  introducing  the  parameter  A. 


3  An  Algorithm  and  Its  Convergence 

In  this  section,  we  develop  an  algorithm  for  finding  a  solution  of  model  (2)  and  provide  a  convergence 
analysis  for  the  developed  algorithm. 

As  we  already  know,  all  solutions  of  model  (2)  should  satisfy  the  fixed-point  equations  given 
by  (4)  and  (5).  Our  proposed  algorithm  for  model  (2)  is  derived  based  on  the  following  equivalent 
form  of  equations  (4)  and  (5) 

u  =  proxi|M|i  ((7  -  | Ata)  u  -  ^AT(v  -  i v)J  , 

*  w  =  pro xtBe(._fe)(Au  +  v),  (9) 

v  =  Au  +  v  —  w. 

Based  on  the  above  fixed-point  equations  in  terms  of  u,  w,  and  v,  for  arbitrary  initial  vectors 
u°  G  Mra,  w°,v°  G  Rm,  we  generate  the  sequence  {uk  :  k  G  No}  by  the  following  iterative  scheme 

uk+i  =  proXj_||  ||^  uk  —  ^AT (vk  —  tcfc)^  , 

<  wk+ 1  _  proxtB  (,_6)(Aufc+1  +  vk),  (10) 

vk+ 1  =  Auk+l  +  vk  —  wk+1, 

where  No  :=  {0, 1, . . .}. 

To  show  convergence  of  the  iterative  scheme  (10),  we  recall  a  result  from  [14]. 

Lemma  3.1  (Theorem  3.5  in  [14])  If  x  is  a  vector  in  A  is  an  mxn  matrix,  ip  is  in  Po(Mm) 
and  a,  (3,  A  are  positive  numbers  such  that  <  jp]p?  then  the  sequence  {uk  :  k  G  No}  generated 
by  the  following  iterative  scheme 

uk+i  _  x  _|_  proXi ||  ||^  uk  —  x  —  j^AT(vk  —  wk)^j  , 

<  wk+l  =  proxi¥3(Aufc+1  +  vk),  (11) 

yk+l  _  Auk+1  _)_  yk  _  yjk- 1-1 
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converges  to  a  solution  of  the  optimization  problem 


min{A||rt  —  x||i  +  tp  o  A(u)  :  u  £  Mm}. 


(12) 


With  the  help  of  Lemma  3.1,  the  following  result  shows  that  under  appropriate  conditions  on 
parameters  a  and  f3  the  sequence  {uk  :  k  £  No}  converges  to  a  solution  of  model  (2). 


Theorem  3.2  Let  e  be  a  nonnegative  number,  let  Be  be  the  ball  in  Mm  centered  at  the  origin  with 
radius  e,  let  b  be  a  point  in  Mm,  and  let  A  be  an  m  x  n  matrix.  If 


-  <  1 

a  <  \\A\\*' 


(13) 


then  for  arbitrary  initial  vectors  u°  £  Mn,  w°,v°  £  Rm;  the  sequence  {uk  :  k  £  No}  generated  by  the 
iterative  scheme  (10)  converges  to  a  solution  of  model  (2). 

Proof:  By  setting  x  =  0  and  A  =  1  and  identifying  ip  =  ibX'  ~  &)  in  model  (12),  the  iterative 
scheme  (10)  can  be  viewed  as  a  special  case  of  the  one  given  in  (11).  The  desired  result  follows 
immediately  from  Lemma  3.1.  □ 


The  convergence  result  given  by  Theorem  3.2  offers  a  practical  way  to  find  a  solution  of 
model  (2).  Since  the  explicit  forms  of  the  proximity  operators  prox.1  and  proxtfi  ^._b^  are  given 
by  (6)  and  Lemma  2.2,  respectively,  based  on  Theorem  3.2  a  unified  approach  for  solving  both 
(BP)  and  (BPe)  is  depicted  in  Algorithm  1. 


Algorithm  1  (The  iterative  scheme  for  model  (BPe)  with  e  >  0) 


Initialization:  v°  £  Rm,  w°  £  Rm,  u°  £  Rn,  e  >  0,  a  >  0,  and  j3  >  0  with  ^  <  -pp-' 
repeat  (A;  >  0) 


Step  1:  uk+1  <-  proxi  |M|i  -  ^AT  A^j  uk  -  ^AT (vk  -  wk)^j 


Step  2:  w 


fc+i 


Auk+l  +  vk, 


b  +  e\\A$S$-b\\2’  otherwise. 


if  ||Akifc+1  +  vk  —  6|| 2  <  e; 


Step  3:  vk+1  <-  Auk+1  +  vk  -  wk+1 
until  a  given  stopping  criteria  is  met 


We  make  some  comments  on  Algorithm  1.  Step  1  of  computing  uk+1  is  from  the  first  equation 
in  (10);  Step  2  of  computing  wk+l  is  from  the  second  equation  in  (10)  and  Lemma  2.2;  Step  3  of 
computing  vk+l  is  exactly  the  same  as  the  last  equation  in  (10).  This  algorithm  can  be  presented 
in  a  more  computationally  efficient  way  by  combining  Step  2  and  Step  3  together  and  eliminating 
the  intermediate  variable  wk.  The  motivation  comes  from  the  observation  Auk  —  wk  =  vk  —  vk 
which  is  due  to  the  third  step  of  Algorithm  1.  Substituting  Auk  —  wk  in  Step  1  by  vk  —  vk~l  yields 
that 

uk+1  =  proxi  |M|i  (uk  -  ^At( 2vk  -  u^"1)^  ,  (14) 

with  an  assumption  v~l  =  —  ( Au°  —  w°)  for  given  u°,  w°,  and  u°.  We  can  further  substitute 

wk+1  computed  in  Step  2  into  Step  3.  In  this  way,  the  intermediate  variable  wk  is  no  longer 
needed.  Hence,  these  simplifications  yield  Algorithm  2,  a  variant  of  Algorithm  1.  When  e  =  0, 
all  vectors  wk+1  in  Algorithm  1  are  equal  to  the  constant  vector  b  for  all  k  >  0.  Because  of  this, 
we  would  like  to  set  w°  =  b  in  both  Algorithms  1  and  2.  Finally,  it  is  more  efficient  to  update 
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uk+l  with  Step  1  of  Algorithm  2  than  Step  1  of  Algorithm  1  in  each  iteration  since  the  matrix- 
vector  multiplication  involving  A  is  not  required  in  equation  (14).  However,  updating  uk+1  via  the 
formulation  of  Step  2  in  Algorithm  1  can  be  implemented  through  the  use  of  the  component-wise 
Gauss-Seidel  iteration  which  may  accelerate  the  rate  of  convergence  of  the  algorithm  and  therefore 
reduce  the  total  CPU-time  consumed.  The  efficiency  of  component- wise  Gauss-Seidel  iteration  has 
been  verified  in  [14,  15]. 


Algorithm  2  (A  variant  of  Algorithm  1  for  model  (BPe)) 

Initialization:  v°  G  Mm,  u°  €  Mn,  e  >  0,  a  >  0,  and  (3  >  0  with  ^  <  -pjp;  set  u"1  = 
v°  —  (Au°  —  d°), 
repeat  (k  >  0) 

Step  1:  uk+1  «-  proxi  |M|i  (uk  -  ^AT ( 2vk  -  v k~1)\ 


Step  2:  vk+1 


0, 


\\Auk+1+vk~  b\\2 
until  a  given  stopping  criteria  is  met 


1  - 


if  \\Auk+l  +  vk  -  b\\2  <  e- 
(Auk+1  +  vk  —  6),  otherwise. 


4  Numerical  Simulations 

This  section  is  devoted  to  showing  the  numerical  performance  of  the  proposed  algorithms  for  com¬ 
pressive  sampling.  We  use  the  algorithm  NESTA  [2]  as  a  comparison.  We  focus  on  sparse  signals 
with  various  dynamic  ranges  and  measurement  matrices  from  randomly  partial  discrete  cosine  trans¬ 
forms  (DCTs)  and  evaluate  performance  of  algorithms  in  terms  of  various  error  metrics,  speed,  and 
robustness  to  noise.  All  the  experiments  are  performed  in  Matlab  7.11  on  Thinkpad  T400  with 
Intel  Core  Duo  CPU  @2.26G,  3GB  RAM  on  Windows  Vista  Home  Basic  operating  system. 

We  begin  with  a  description  of  generating  the  mxn  sensing  matrix  A  and  length-n  and  s-sparse 
signals.  In  each  experimental  trial,  the  mxn  sensing  matrix  A  is  generated  by  randomly  picking  m 
rows  from  the  nxn  DCT  matrix.  Sparse  signals  u  used  in  our  experiments  are  generated  according 
to  [2],  In  each  experimental  trial,  a  length-n,  s-sparse  signal  (a  signal  having  exactly  s  nonzero 
components),  is  generated  in  such  a  way  that  non-zero  components  are  given  by 

r?ilO0’72,  (15) 

where  ij i  =  ±1  with  probability  1/2  and  7j2  is  uniformly  distributed  in  [0,1].  The  locations  of 
the  nonzero  components  are  randomly  permuted.  Clearly,  the  range  of  the  magnitude  of  nonzero 
components  of  an  s-sparse  signal  is  [0, 10e]  with  the  parameter  6  controlling  this  dynamic  range. 
An  observed  signal  (data)  is  collected  by  b  =  Au  +  z.  where  2  represents  a  Gaussian  noise. 

The  accuracy  of  a  solution  obtained  from  a  specific  algorithm  is  quantified  by  the  relative 
^2-error,  the  relative  ^i-error,  and  the  absolute  foo-error  defined,  respectively,  as  follows: 

||rt  -  Uoll/IMI,  IIMli  -  IKHil/IHli,  ||w-ito||oo,  (16) 

where  u  is  the  true  data  and  is  the  restored  data.  All  results  reported  in  the  tables  of  this 
section  are  the  means  and  the  standard  deviations  of  these  relative  errors  from  simulations  that 
were  performed  20  trials. 

To  use  Algorithm  2,  one  needs  to  fix  the  parameters  a  and  /3  such  that  /3/a  <  1/||A||2  (see 
Theorem  3.2).  From  Step  1  of  the  algorithm,  the  ratio  /3/a  plays  a  role  of  step-size  of  changing  uk . 
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Therefore,  we  keep  this  ratio  as  big  as  possible  and  set 


P 


0.999 


(17) 


in  our  numerical  experiments.  In  such  the  way,  a  is  essentially  the  only  parameter  that  needs  to  be 
determined.  We  now  investigate  the  impact  of  the  parameter  a  on  the  performance  of  Algorithm  2. 

To  investigate  the  impact  of  varying  the  parameter  a  on  the  performance  of  Algorithm  2,  we 
consider  the  configuration  of  n  =  215,  m  =  nj 2,  s  =  0.05  and  the  dynamic  range  parameter  9  =  5. 
The  observed  data  is  noise  free.  Six  different  values  of  a,  namely,  0.0025,  0.005,  0.01,  0.02,  0.04, 
and  0.08,  are  tested.  Figure  4.1(a)  depicts  the  traces  of  the  relative  ^i-error  (see  (16))  against  the 
number  of  iterations  for  each  a.  As  it  can  be  seen  from  this  figure,  for  a  =  0.0025,  the  smallest 
value  in  our  test,  the  relative  £i-error  drops  rapidly  from  1  to  10~4,  stabilizes  with  insignificant 
changes  for  about  1200  iterations  and  then  quickly  drops  again  to  the  level  of  10~16.  When  a 
increases  from  0.0025  to  0.08,  the  number  of  iterations  required  for  the  relative  ^i-error  dropping 
from  1  to  10“4  increases.  Meanwhile,  the  numbers  of  iterations  for  the  transitions  from  the  first 
sharp  jump  region  to  the  second  one  decrease.  For  example,  it  is  about  700  for  a  =  0.005  and  only 
few  iterations  for  a  =  0.08.  These  observations  motivate  us  to  extend  Algorithm  2  to  a  scenario  in 
which  the  parameter  a  can  be  updated  during  the  iteration  with  the  goal  of  reducing  the  number 
of  iterations.  The  proposed  approach  is  rather  simple.  It  begins  with  a  relative  small  a  and  then 
increases  it  for  every  given  amount  of  iterations.  A  detailed  flow  of  this  new  approach  is  given  in 
Algorithm  3. 


(a)  (b) 

Figure  4.1:  The  relative  t\  error  (the  vertical  axis  with  a  base  10  logarithmic  scale)  versus  number 
of  iterations  (the  horizontal  axis),  (a)  Convergence  of  Algorithm  2  with  different  values  of  cc;  (b) 
Convergence  of  Algorithm  3. 


Three  new  parameters  introduced  in  Algorithm  3  are  integers  p  >  0,  r  >  1,  and  T  >  0. 
The  parameter  T  is  the  allowable  maximum  number  of  updating  the  parameters  a  and  (3 .  For 
each  update,  the  pair  (a,  (3)  will  change  to  ( Ta,(3/r )  that  will  keep  the  ratio  (3/a  unchanged.  The 
parameter  p  is  to  indicate  that  the  underlying  algorithm  with  a  pair  (a,  (3)  will  iterate  p  times  before 
the  algorithm  with  the  pair  (ra,/3/r)  runs  another  p  times.  We  now  demonstrate  the  efficiency  of 
varying  the  parameters  a  and  /3  via  applying  Algorithm  3  for  the  same  data  used  in  Figure  4.1(a). 
We  set  T  =  6,  r  =  4,  and  p  =  20  and  initialize  a  =  TrjpT^jj — •  Again,  we  choose  (3  by  using 
(17).  The  corresponding  result  is  shown  in  Figure  4.1(b).  It  is  clear  to  see  that  it  takes  about 
200  iterations  to  drop  the  relative  error  down  below  10~14.  Hence  the  strategy  of  updating  the 
parameters  a  and  (3  as  described  in  Algorithm  3  is  reasonable. 


Algorithm  3  (A  variant  of  Algorithm  1  for  model  (BPe)) 

Given:  integers  p  >  0,  r  >  1,  and  T  >  0;  e  >  0 

Initialization:  v°  G  Mm,  u°  G  Mn,  a  >  0,  and  (5  >  0  with  ^  <  -pp-;  set  v~* 2  =  v°  —  ( Au°  —  d°) 
repeat  (A:  >  0) 

Step  1:  Compute  uk+1  using  Step  1  of  Algorithm  2 
Step  2:  Compute  vk+1  using  Step  2  of  Algorithm  2 

Step  3:  If  A;  is  a  multiple  of  p  and  the  number  of  changing  the  parameters  a  and  /3  does  not 
exceed  T,  update  a  <—  ra,  /3  «—  ^ 
until  a  given  stopping  criteria  is  met 


The  rest  of  this  section  consists  of  two  parts.  Part  one  contains  the  results  for  (BP)  while  part 
two  contains  the  results  for  (BPe). 

In  part  one,  we  compare  the  performance  of  Algorithm  3  with  that  of  the  NESTA  [2],  The 

algorithm  NESTA  was  developed  by  applying  a  smoothing  technique  for  the  nonsmooth  t\ -norm 

and  an  accelerated  first-order  scheme,  both  from  Nesterov’s  work  [19].  A  parameter  denoted  by 
p  is  used  to  control  how  close  the  smoothed  ^i-norm  to  the  t\  -norm  will  be.  Two  different  levels 
of  p,  namely,  p  =  10~2 *  and  p  =  10“ '  ,  are  used  for  the  NESTA  in  experiments.  A  parameter 

5  for  tolerance  in  the  NESTA  varies  for  different  values  of  the  smoothing  parameter  p  and  needs 

to  be  determined.  We  finally  choose  5  =  10~8 * * *  for  p  =  10-2  and  6  =  10~12  for  p  =  10“ '  since 

such  choices  lead  to  reasonable  results.  For  Algorithm  3,  we  set  p  =  20  and  T  to  be  the  smallest 

integer  that  is  greater  than  log10(^ ||AT6||00).  In  our  experiments,  we  notice  that  ^■||AT6||00  is 

approximately  equal  to  10e .  As  a  result,  T  is  about  6  +  1.  The  stopping  criterion  of  Algorithm  3 

is  that  the  relative  errors  between  the  successive  iterates  of  the  reconstructed  signal  should  satisfy 

the  inequality  ||ufc+1  —  refe||/||refc||  <  10-15. 

In  our  experiments  for  problem  (BP),  the  dimensions  n  and  the  dynamic  ranges  6  of  the  unknown 

signals  are  chosen  from  {213 *, 215 *, 217}  and  {1,3,5},  respectively.  The  number  of  nonzero  entries  s 
is  set  to  be  0.05n,  0.02n,  O.Olre  respectively  for  the  number  of  measurements  m  =  re/2,  re/4,  re/8. 

The  left-column  of  Figure  4.2  shows  the  results  of  Algorithm  3  and  the  NESTA  when  the  dimension 
of  the  tested  signals  re  is  21'  and  the  number  of  measurements  rre  is  re/4.  The  symbols  ‘O’, 

and  ‘V’  denote  the  results  produced  by  Algorithm  3,  the  NESTA  with  p  =  ICC2,  and  the  NESTA 
with  p  =  ICC',  respectively.  The  colors  ‘red’,  ‘blue’,  and  ‘yellow’  represent  the  dynamic  ranges  of 
the  tested  signals  with  6  being  1,  3,  and  5,  respectively.  Three  error  metrics,  namely  the  relative 
^2-error,  t^-error,  O^-error,  are  respectively  displayed  with  a  base  10  logarithmic  scale  plot  for  the 
vertical  axis.  We  see  clearly  that  Algorithm  3  is  significantly  better  than  the  NESTA  in  terms 
of  these  error  metrics  for  the  unknown  signals  with  various  dynamic  ranges  and  the  CPU  time 
consumed.  We  also  observe  that  the  relative  I^-error  and  ^i-error  of  the  results  recovered  by 
Algorithm  3  along  with  the  CPU  time  consumed  are  quite  robust  with  respect  to  the  dynamic 
ranges  of  the  unknown  signals.  The  same  conclusions  can  be  drown  for  the  results  with  rre  =  re/2 

and  m  =  re/8  as  well. 

Part  two  presents  results  for  problem  (BPe).  The  settings  of  dimension,  sparsity,  and  dynamic 
range  of  unknown  signals  for  problem  (BPe)  are  the  same  as  those  for  problem  (BP).  The  only 
difference  is  that  measurements  in  part  two  are  contaminated  with  noise.  In  our  experiments, 
noise  levels  in  the  measurements  vary  with  the  dynamic  ranges  of  the  unknown  signals.  More 
precisely,  the  noise  levels  cr  are  set  to  be  0.05,  1.0,  and  5.0  corresponding  to  choices  1,  3,  and  5  of 
the  dynamic  range  parameter  6,  respectively.  It  turns  out  that  the  noise  power  is  e2  =  mcr2.  The 
stopping  criteria  for  our  algorithm  is  that  the  relative  errors  between  the  successive  iterates  of  the 
reconstructed  signal  should  satisfy  the  inequality  ||refc+1  —  refc||/||ufe||  <  10~5 * *.  For  the  smoothing 
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Figure  4.2:  The  relative  £\,  £ 2 ,  and  l oo  errors  (the  vertical  axis  with  base  10  logarithmic  scale) 
versus  the  CPU  time  consumed  (the  horizontal  axis)  for  (the  left-column)  the  noise  free  case  and 
(the  right-column)  the  noise  case.  The  colors  ‘red’,  ‘blue’,  and  ‘yellow’  represent  the  dynamic  ranges 
of  the  tested  signals  with  6  being  1,  3,  and  5,  respectively. 
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parameter  p  in  the  NESTA,  we  choose  the  default  setting  p  =  max{0.1cj,  0.01}.  The  right  column 
of  Figure  4.2  shows  the  results  of  Algorithm  3  and  the  NESTA  when  the  dimension  of  the  tested 
signals  n  is  21'  and  the  number  of  measurements  m  is  n/4.  We  see  that  the  accuracy  of  our 
algorithm  is  better  than  that  of  the  NESTA  in  terms  of  relative  terror  and  relative  £oo-error. 
There  is  up  to  1  order  of  improvement  of  accuracy  regarding  the  relative  td-error  with  Algorithm  3. 
Regarding  the  consumed  CPU  time,  Algorithm  3  consumes  at  most  half  of  the  CPU  time  that  the 
NESTA  does.  Again,  the  same  conclusions  can  be  drown  for  the  results  with  m  =  n/2  and  m  =  n/ 8 
as  well. 

5  Conclusions 

We  reformulated  the  £i-norm  minimization  problems  (BP)  and  (BPe)  via  indicator  functions  as 
unconstrained  minimization  problems.  The  objective  function  for  each  unconstrained  problem  is 
the  sum  of  the  Id-norm  of  the  underlying  signal  u  and  the  indicator  function  of  a  set  in  Rm,  which 
is  {0}  for  (BP)  or  the  e-ball  for  (BPe),  composing  with  the  affine  transformation  Au  —  b.  Due  to 
the  structure  of  this  objective  function  and  the  availability  of  the  explicit  forms  of  the  proximity 
operators  for  both  the  £ i -norm  and  the  indicator  function,  an  accurate  and  efficient  algorithm  is 
developed  for  recovering  sparse  signals  based  on  fixed-point  equation.  The  algorithm  outperforms 
the  state-of-the-art  algorithm  NESTA  in  terms  of  the  relative  £2,  the  relative  i\ ,  and  the  absolute 
t'oo  error  measures  as  well  as  the  CPU  time  for  tested  signals  ranging  from  a  low  dynamic  range  to 
a  high  dynamic  range  with  different  sizes. 
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